Fast Randomized Semi-Supervised Clustering

Authors

  • Alaa Saade
  • Florent Krzakala
  • Marc Lelarge
  • Lenka Zdeborová
Abstract

We consider the problem of clustering partially labeled data from a minimal number of randomly chosen pairwise comparisons between the items. We introduce an efficient local algorithm based on a power iteration of the non-backtracking operator and study its performance on a simple model. For the case of two clusters, we give bounds on the classification error and show that a small error can be achieved from O(n) randomly chosen measurements, where n is the number of items in the dataset. Our algorithm is therefore efficient both in terms of time and space complexities. We also investigate numerically the performance of the algorithm on synthetic and real world data.
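As a rough illustration of what a power iteration of a non-backtracking operator seeded by a few labels can look like in the two-cluster case, here is a minimal sketch. The abstract does not specify the model or the update rule, so everything here is an assumption made for illustration: comparisons are encoded as weights +1 (same cluster) or -1 (different clusters), messages live on directed edges and are initialised from the known labels, and the function name `nonbacktracking_cluster` and its parameters are hypothetical. It should not be read as the authors' algorithm.

```python
from collections import defaultdict


def nonbacktracking_cluster(n, edges, labels, n_iter=10):
    """Illustrative two-cluster estimate from signed pairwise comparisons.

    n      -- number of items, indexed 0..n-1
    edges  -- dict {(i, j): w}, w = +1 (judged same cluster) or -1 (different)
    labels -- dict {i: +1 or -1} for the few items whose cluster is known
    All names and the encoding are assumptions, not taken from the paper.
    """
    neighbors = defaultdict(dict)
    for (i, j), w in edges.items():
        neighbors[i][j] = w
        neighbors[j][i] = w

    # One message per directed edge i -> j, seeded with the label of i (0 if unknown).
    msg = {(i, j): float(labels.get(i, 0)) for i in neighbors for j in neighbors[i]}

    for _ in range(n_iter):
        new = {}
        for (i, j) in msg:
            # Non-backtracking step: gather messages arriving at i from every
            # neighbour except j, each multiplied by the comparison weight.
            new[(i, j)] = sum(
                w * msg[(k, i)] for k, w in neighbors[i].items() if k != j
            )
        # Rescale so the values stay bounded across iterations.
        norm = max((abs(v) for v in new.values()), default=0.0) or 1.0
        msg = {e: v / norm for e, v in new.items()}

    # Final score of item i: weighted sum of all incoming messages.
    scores = [
        sum(w * msg[(k, i)] for k, w in neighbors[i].items()) for i in range(n)
    ]
    return [1 if s >= 0 else -1 for s in scores]


# Toy usage: six items, noiseless comparisons on all pairs, two known labels.
truth = [1, 1, 1, -1, -1, -1]
edges = {(i, j): truth[i] * truth[j] for i in range(6) for j in range(i + 1, 6)}
estimate = nonbacktracking_cluster(6, edges, labels={0: 1, 3: -1})
```

Each iteration touches every directed edge once per neighbour, so on a sparse comparison graph with O(n) edges and bounded degree the cost per iteration is O(n), which is in the spirit of the linear time and space claim in the abstract.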

Similar resources

Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering

Although many studies have been conducted to improve clustering efficiency, most state-of-the-art schemes suffer from a lack of robustness and stability. This paper proposes an efficient approach to elicit prior knowledge, in the form of must-link and cannot-link constraints, from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...

Semi Supervised Image Segmentation by Optimal Color Seed Selection using Fast Genetic Algorithm

Many researchers have noted the significance of perceptual grouping and organization in vision and have listed various key factors, such as similarity, proximity, and good continuation, that guide the visual grouping of an image. However, many of the computational aspects of perceptual grouping remain unanswered to this day. As there are several probable partitions of the domain...

Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge

The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory has been applied to clustering problems. Previous research showed that it can significantly increase the stability and performance of...

Fast Krylov Methods for Clustering

At the heart of unsupervised and semi-supervised clustering is the calculation of matrix eigenvalues (and eigenvectors) or a matrix inversion. In general, the complexity of this step is O(N^3). By using Krylov subspace methods and fast methods, we improve the performance to O(N log N). We also make a thorough evaluation of the errors introduced by the fast algorithm.

Journal:
  • CoRR

Volume: abs/1605.06422

Publication date: 2016